A Parallel Implementation of K-Means Clustering on GPUs
Authors
Abstract
Graphics Processing Units (GPUs) have recently attracted research attention as efficient coprocessors for implementing many classes of highly parallel applications. GPU design is engineered for graphics applications, where many independent SIMD workloads are simultaneously dispatched to processing elements. While parallelism has been explored in the context of traditional CPU threads and SIMD processing elements, dividing the steps of a parallel algorithm for execution on GPU architectures remains a significant challenge. In this paper, we take a first step towards building an efficient GPU-based parallel implementation of K-Means, a commonly used clustering algorithm, on an NVIDIA G80 PCI Express graphics board using the CUDA processing extensions. Clustering algorithms are important for search, data mining, spam and intrusion detection applications. Modern desktop machines commonly include desktop search software that can be greatly enhanced by these advances, while low-power machines such as laptops can reduce power consumption by utilizing the video chip for these clustering and indexing operations. Our preliminary results, obtained with an average-spec G80 graphics card (the NVIDIA 8600GT), show over a 13x performance improvement compared to a baseline 3 GHz Intel Pentium(R) based PC running the same algorithm. The low cost of these video cards (less than $100 market price as of 2008) and the high performance gains suggest that our approach is both practical and economical for common applications.
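For readers unfamiliar with the algorithm named in the abstract, here is a minimal sequential sketch of K-Means (Lloyd's iteration) in plain Python. It is not the paper's CUDA implementation, and the function names `assign_points`, `update_centroids`, and `kmeans` are hypothetical. It only illustrates the two steps that a GPU version would have to divide: a per-point assignment step, where every point can be handled independently, and a per-cluster update step, which is a reduction.

```python
# Illustrative sketch only: sequential K-Means, not the paper's CUDA code.
from math import dist  # Euclidean distance (Python 3.8+)


def assign_points(points, centroids):
    """Assignment step: each point independently picks its nearest centroid.

    No iteration depends on another, so this step is the natural candidate
    for one-thread-per-point execution on a GPU (an assumption, not a
    description of the paper's kernel).
    """
    return [
        min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points.

    This is a per-cluster reduction (sum and count). An empty cluster keeps
    its previous centroid.
    """
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for p, label in zip(points, labels):
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += p[d]
    return [
        tuple(s / counts[k] for s in sums[k]) if counts[k] else tuple(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans(points, centroids, max_iters=100):
    """Alternate the two steps until the assignment stops changing."""
    labels = None
    for _ in range(max_iters):
        new_labels = assign_points(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids
```

Calling `kmeans(points, initial_centroids)` with a list of coordinate tuples returns the final cluster label of each point and the final centroids. The initial centroids are supplied by the caller here; how the paper initializes them is not stated in the abstract.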
Similar resources
High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation
Image segmentation is one of the most common steps in digital image processing. There are many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task, mainly due to the noise, low contrast, and steep...
Comparing k-means clusters on parallel Persian-English corpus
This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research has been done on clustering English documents. This raises the question of whether those results extend to other languages. Since the goal of document clustering is grouping of docum...
Scalable Data Clustering Using Fermi GPUs on FutureGrid
Scientific applications are generating huge data sets. These data sets need to be classified into subsets in order to draw meaningful conclusions. Data clustering is the statistical analysis process that groups similar objects into relatively homogeneous sets, which are called clusters. The computational demands of data clustering grow rapidly. And it is very time consuming for ...
Publication date: 2008